Abstract

We show that for any odd k and any instance $\mathfrak{I}$ of the Max-kXOR constraint satisfaction
problem, there is an efficient algorithm that finds an assignment satisfying at least a
$\frac12 + \Omega(1/\sqrt{D})$ fraction of $\mathfrak{I}$'s constraints, where D is a bound on the number of constraints
that each variable occurs in. This improves both qualitatively and quantitatively on the recent
work of Farhi, Goldstone, and Gutmann (2014), which gave a quantum algorithm to find an
assignment satisfying a $\frac12 + \Omega(D^{-3/4})$ fraction of the equations.

For arbitrary constraint satisfaction problems, we give a similar result for "triangle-free"
instances; i.e., an efficient algorithm that finds an assignment satisfying at least a $\mu + \Omega(1/\sqrt{D})$
fraction of constraints, where $\mu$ is the fraction that would be satisfied by a uniformly random
assignment.
1 Introduction


An instance of a Boolean constraint satisfaction problem (CSP) over n variables $x_1, \dots, x_n$ is a
collection of constraints, each of which is some predicate P applied to a constant number of the 
variables. The computational task is to find an assignment to the variables that maximizes the 
number of satisfied predicates. In general the constraint predicates do not need to be of the same 
"form"; however, it is common to study CSPs where this is the case. Typical examples include: 
Max-kSAT, where each predicate is the OR of k variables or their negations; Max-kXOR, where
each predicate is the XOR of exactly k variables or their negations; and Max-Cut, the special case
of Max-2XOR in which each constraint is of the form $x_i \ne x_j$. The case of Max-kXOR is particularly
mathematically natural, as it is equivalent to maximizing a homogeneous degree-k multilinear
polynomial over $\{\pm1\}^n$.

Given a CSP instance, it is easy to compute the expected fraction $\mu$ of constraints satisfied by
a uniformly random assignment; e.g., in the case of Max-kXOR we always have $\mu = \frac12$. Thus
the question of algorithmic interest is to find an assignment that satisfies noticeably more than
a $\mu$ fraction of constraints. Of course, sometimes this is simply not possible; e.g., for Max-Cut
on the complete n-variable graph, at most a $\frac12 + O(1/n)$ fraction of constraints can be satisfied.¹
However, even when all or almost all constraints can be satisfied, it may still be algorithmically
difficult to beat $\mu$. For example, Håstad [Hås01] famously proved that for every $\epsilon > 0$, given a
Max-3XOR instance in which a $1 - \epsilon$ fraction of constraints can be satisfied, it is NP-hard to find an
assignment satisfying a $\frac12 + \epsilon$ fraction of the constraints. Håstad showed similar "approximation
resistance" results for Max-3Sat and several other kinds of CSPs.

One possible reaction to these results is to consider subconstant $\epsilon$. For example, Håstad
and Venkatesh [HV04] showed that for every Max-kXOR instance with m constraints, one can
efficiently find an assignment satisfying at least a $\frac12 + \Omega(1/\sqrt{m})$ fraction of them.² (Here,
and elsewhere in this introduction, the $\Omega(\cdot)$ hides a dependence on k, typically exponential.)
Relatedly, Khot and Naor [KN08] give an efficient algorithm for Max-3XOR that satisfies a
$\frac12 + \Omega(\epsilon\sqrt{(\log n)/n})$ fraction of constraints whenever the optimum fraction is $\frac12 + \epsilon$.

Another reaction to approximation resistance is to consider restricted instances. One commonly
studied restriction is to assume that each variable's "degree" (i.e., the number of constraints
in which it occurs) is bounded by some D. Håstad [Hås00] showed that such instances
are never approximation resistant. More precisely, he showed that for, say, Max-kXOR, one can
always efficiently find an assignment satisfying at least a $\mu + \Omega(1/D)$ fraction of constraints.³
Note that this advantage of $\Omega(1/D)$ cannot in general be improved, as the case of Max-Cut on the
complete graph shows.

One may also consider further structural restrictions on instances. One such restriction is that
the underlying constraint hypergraph be triangle-free (see Section 2 for a precise definition). For
example, Shearer [She92] showed that for triangle-free graphs there is an efficient algorithm for
finding a cut of size at least $\frac{m}{2} + \Omega(1) \cdot \sum_i \sqrt{\deg(i)}$, where $\deg(i)$ is the degree of the $i$th vertex. As
$\sum_i \sqrt{\deg(i)} \ge \sum_i \frac{\deg(i)}{\sqrt{D}} = \frac{2m}{\sqrt{D}}$ in m-edge degree-D bounded graphs, this shows that for triangle-free
Max-Cut one can efficiently satisfy at least a $\frac12 + \Omega(1/\sqrt{D})$ fraction of constraints. Related results

1 Another trivial example is the Max-2XOR instance with the two constraints $x = y$ and $x \ne y$. For this reason we
always assume that our Max-kXOR instances do not contain a constraint and its negation.

2 In [HV04] this is stated as an approximation-ratio guarantee: if the optimum fraction is $\frac12 + \epsilon$ then $\frac12 + \Omega(\epsilon/\sqrt{m})$ is
guaranteed. However, inspecting their proof yields the absolute statement we have made.

3 The previous footnote applies also to this result. 
have also been shown for degree-bounded instances of Maximum Acyclic Subgraph [BS90],
Min-Bisection [Alo97] and Ordering k-CSPs [GZ12, Mak13].

1.1 Recent developments and our work 

In a recent surprising development, Farhi, Goldstone, and Gutmann [FGG14] gave an efficient
quantum algorithm that, for Max-3XOR instances with degree bound D, finds an assignment satisfying
a $\frac12 + \Omega(D^{-3/4})$ fraction of the constraints. In addition, Farhi et al. show that if the Max-3XOR
instance is "triangle-free" then an efficient quantum algorithm can satisfy a $\frac12 + \Omega(1/\sqrt{D})$
fraction of the constraints.

Farhi et al.'s result was perhaps the first example of a quantum algorithm providing a better
CSP approximation guarantee than that of the best known classical algorithm (namely Håstad's
[Hås00], for Max-3XOR). As such it attracted quite some attention.⁴ In this paper we show
that classical algorithms can match, and in fact outperform, Farhi et al.'s quantum algorithm.


First result: Max-kXOR. We will present two results. The first result is about instances of Max- 
kXOR. 


Theorem 1.1. There is a constant $c = \exp(-O(k))$ and a randomized algorithm running in time
$\mathrm{poly}(m, n, \exp(k))$ that, given an instance $\mathfrak{I}$ of Max-kXOR with m constraints and degree at most D,
finds with high probability an assignment $x \in \{\pm1\}^n$ such that

$$\left|\mathrm{val}_{\mathfrak{I}}(x) - \frac12\right| \ge \frac{c}{\sqrt{D}}. \tag{1.1}$$

Here $\mathrm{val}_{\mathfrak{I}}(x)$ denotes the fraction of constraints satisfied by x. In particular, for odd k, by trying the
assignment and its negation, the algorithm can output an x satisfying

$$\mathrm{val}_{\mathfrak{I}}(x) \ge \frac12 + \frac{c}{\sqrt{D}}. \tag{1.2}$$

In Section 3 we give a simple, self-contained proof of Theorem 1.1 in the special case of Max- 
3XOR. For higher k we obtain it from a more general result (Theorem 4.2) that gives a constructive 
version of a theorem of Dinur, Friedgut, Kindler and O'Donnell [DFKO07]. This result shows how
to attain a significant deviation from the random assignment value for multivariate low-degree 
polynomials with low influences. See Section 4. 

We note that the deviation $\Omega(1/\sqrt{D})$ in (1.1) is optimal. To see why, consider any D-regular
graph on n vertices, and construct a Max-2XOR instance $\mathfrak{I}$ as follows. For every edge $(i,j)$ in
the graph we randomly and independently include either the constraint $x_i = x_j$ or $x_i \ne x_j$. For
every fixed x, the quantity $\mathrm{val}_{\mathfrak{I}}(x)$ has distribution $\frac1m \mathrm{Binomial}(m, \frac12)$, where $m = \frac{nD}{2}$. Hence
a Chernoff-and-union-bound argument shows that with high probability all $2^n$ assignments will
have $|\mathrm{val}_{\mathfrak{I}}(x) - \frac12| \le O(\sqrt{n/m}) = O(1/\sqrt{D})$. This can easily be extended to Max-kXOR for $k > 2$.
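The Chernoff-and-union-bound step can be spelled out as follows. This is only a sketch: it uses Hoeffding's inequality as the concentration bound, and the concrete choice $\varepsilon = \sqrt{n/m}$ is an assumption made here for illustration rather than a constant taken from the text.

```latex
% Fix an assignment x. Over the random choice of the instance,
% m * val(x) is distributed as Binomial(m, 1/2), so by Hoeffding's inequality
\[
  \Pr\Bigl[\bigl|\mathrm{val}_{\mathfrak{I}}(x) - \tfrac12\bigr| \ge \varepsilon\Bigr]
  \le 2\exp(-2\varepsilon^2 m).
\]
% Union bound over all 2^n assignments, with \varepsilon = \sqrt{n/m}:
\[
  \Pr\Bigl[\exists x :\ \bigl|\mathrm{val}_{\mathfrak{I}}(x) - \tfrac12\bigr| \ge \sqrt{n/m}\Bigr]
  \le 2^n \cdot 2e^{-2n} = 2\,(2/e^2)^n \longrightarrow 0 .
\]
% Since m = nD/2, the deviation bound is \sqrt{n/m} = \sqrt{2/D} = O(1/\sqrt{D}).
```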


General CSPs. As noted earlier, the case of Max-Cut on the complete graph shows that for
general CSPs, and in particular for Max-2XOR, we cannot guarantee a positive advantage of
$\Omega(1/\sqrt{D})$ as in (1.2). In fact, a positive advantage of $\Omega(1/D)$ is the best possible, showing that
the guarantee of Håstad [Hås00] is tight in general.

4 As evidenced by the long list of authors on this paper; see also http://www.scottaaronson.com/blog/?p=2155.
A similar example can be shown for Max-2SAT: consider an instance with $D^2$ variables and
imagine them placed on a $D \times D$ grid. For any two variables in the same row add the constraint
$x \vee y$ and for any two variables in the same column add the constraint $\bar{x} \vee \bar{y}$. Then each variable
participates in $O(D)$ clauses, and it can be verified that the best assignment satisfies a $3/4 + O(1/D)$
fraction of the clauses. We do not know if the same holds for Max-3SAT and we leave that as an
open question.

Sometimes no advantage over random is possible. For instance, consider the following instance
with 8 clauses on 6 variables, in which any assignment satisfies exactly 1/2 of the clauses:

$$\begin{gathered}
\{\mathrm{NAE}(x_1,x_2,x_3),\\
\mathrm{AE}(y_1,x_2,x_3),\ \mathrm{AE}(x_1,y_2,x_3),\ \mathrm{AE}(x_1,x_2,y_3),\\
\mathrm{NAE}(x_1,y_2,y_3),\ \mathrm{NAE}(y_1,x_2,y_3),\ \mathrm{NAE}(y_1,y_2,x_3),\\
\mathrm{AE}(y_1,y_2,y_3)\},
\end{gathered}$$

where NAE denotes the "not all equal" constraint, and AE is the "all equal" constraint.
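The claim about this 8-clause instance is small enough to check by exhaustive enumeration. The sketch below is illustrative only; the helper names `nae`, `ae` and `satisfied_count` are introduced here and are not from the paper.

```python
from itertools import product

def nae(a, b, c):
    """'Not all equal' predicate on three +/-1 values."""
    return not (a == b == c)

def ae(a, b, c):
    """'All equal' predicate on three +/-1 values."""
    return a == b == c

def satisfied_count(x, y):
    """Number of the 8 clauses satisfied by the assignment (x, y)."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    clauses = [
        nae(x1, x2, x3),
        ae(y1, x2, x3), ae(x1, y2, x3), ae(x1, x2, y3),
        nae(x1, y2, y3), nae(y1, x2, y3), nae(y1, y2, x3),
        ae(y1, y2, y3),
    ]
    return sum(clauses)

# Collect the satisfied-clause counts over all 2^6 assignments.
counts = {satisfied_count(x, y)
          for x in product([-1, 1], repeat=3)
          for y in product([-1, 1], repeat=3)}
```

Every one of the 64 assignments satisfies exactly 4 of the 8 clauses, so `counts` contains the single value 4.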


Second result: triangle-free instances of general CSPs. Despite the above examples, our second
result shows that it is possible to recover the optimal advantage of $1/\sqrt{D}$ for triangle-free
instances of any CSP:


Theorem 1.2. There is a constant $c = \exp(-O(k))$ and a randomized algorithm running in time
$\mathrm{poly}(m, n, \exp(k))$ that, given a triangle-free, degree-D CSP instance $\mathfrak{I}$ with m arbitrary constraints,
each of arity between 2 and k, finds with high probability an assignment $x \in \{\pm1\}^n$ such that

$$\mathrm{val}_{\mathfrak{I}}(x) \ge \mu + \frac{c}{\sqrt{D}}.$$

Here $\mu$ is the fraction of constraints in $\mathfrak{I}$ that would be satisfied in expectation by a random assignment.


This theorem is proved in Section 5. For simplicity, we state our results as achieving randomized 
algorithms and leave the question of derandomizing them (e.g., by replacing true random bits 
with $O(k)$-wise independence or some other such distribution) to future work.


1.2 Overview of our techniques 

All three algorithms that we present in this work follow the same broad outline, while the details 
are different in each case. To produce an assignment that beats a random assignment, the idea is to 
partition the variables into two sets (F, G) with F standing for 'Fixed' and G standing for 'Greedy'
(in Section 4, these correspond to [n] \ U and U respectively). The variables in F are assigned 
independent and uniform random bits and the variables in G are assigned values greedily based 
on the values already assigned to F. We will refer to constraints with exactly one variable from G 
as active constraints. The design of the greedy assignments and their analysis is driven by two key 
objectives. 

1. Obtain a significant advantage over the random assignment on active constraints. 

2. Achieve a value that is at least as good as the random assignment on inactive constraints. 
The simplest example is the algorithm for Max-3XOR that we present in Section 3. First, we appeal
to a decoupling trick due to Khot-Naor [KN08] to give an efficient approximation-preserving
reduction from an arbitrary instance $\mathfrak{I}$ of Max-3XOR to a bipartite instance $\widetilde{\mathfrak{I}}$. Specifically, the
instance $\widetilde{\mathfrak{I}}$ will contain two sets of variables $\{y_i\}_{i\in[n]}$ and $\{z_j\}_{j\in[n]}$, with every constraint having
exactly one variable from $\{y_i\}_{i\in[n]}$ and two variables from $\{z_j\}_{j\in[n]}$. Notice that if we set
$G = \{y_i\}_{i\in[n]}$, then objective (2) holds vacuously, i.e., every constraint in $\widetilde{\mathfrak{I}}$ is active. The former
objective (1) is achieved as a direct consequence of anticoncentration of low degree polynomials
(see Fact 2.3). In the case of Max-kXOR, the second objective is achieved by slightly modifying the
greedy assignment: we flip each of the assignments for the greedy variables with a small probability
$\eta$ (that corresponds to one of the extrema of the degree-k Chebyshev polynomials of the first
kind).

Our algorithm for triangle-free instances begins by picking (F, G) to be a random partition of 
the variables. In this case, after fixing a random assignment to F, a natural greedy strategy would 
proceed as follows: Assign each variable in G a value that satisfies the maximum number of
its own active constraints. 

In order to achieve objective (2), it is sufficient if for each inactive constraint its variables are 
assigned independently and uniformly at random. Since the instance is triangle-free, for every 
pair of variables $x_i, x_j \in G$ the active constraints of $x_i$ and $x_j$ are over disjoint sets of variables.
This implies that the greedy assignments for variables within each inactive constraint are already
independent. Unfortunately, the greedy assignment as defined above could possibly be biased,
and in general much worse than a random assignment on the inactive constraints. We overcome
this technical hurdle by using a modified greedy strategy defined as follows. Assign $-1$ to all
variables in G and then for each variable $x_i \in G$, consider the change in the number of active
constraints satisfied if we flip $x_i$ from $-1$ to $1$. The algorithm will flip the value only if this number
exceeds an appropriately chosen threshold $\theta_i$. The threshold $\theta_i$ is chosen so as to ensure that over
all choices of values to F, the assignment to $x_i$ is unbiased. Triangle-freeness implies that these
assignments are independent within each inactive constraint. Putting these ideas together, we 
obtain the algorithm for triangle-free instances discussed in Section 5. 

2 Preliminaries 

Constraint satisfaction problems. We will be considering a somewhat general form of constraint
satisfaction problems. An instance for us will consist of n Boolean variables and m constraints. We
call the variables $x_1, \dots, x_n$, and we henceforth think of them as taking the Boolean values $\pm1$.
Each constraint is a pair $(P_\ell, S_\ell)$ (for $\ell \in [m]$) where $P_\ell : \{\pm1\}^r \to \{0,1\}$ is the predicate, and $S_\ell$
is the scope, an ordered r-tuple of distinct coordinates from $[n]$. The associated constraint is that
$P_\ell(x_{S_\ell}) = 1$, where we use the notation $x_S$ to denote variables x restricted to coordinates S. We
always assume (without loss of generality) that $P_\ell$ depends on all r coordinates. The number r is
called the arity of the constraint, and throughout this paper k will denote an upper bound on the
arity of all constraints. Typically we think of k as a small constant.

We are also interested in the special case of Max-kXOR. By this we mean the case when all
constraints are XORs of exactly k variables or their negations; in other words, when every $P_\ell$ is of
the form $P_\ell(x_1, \dots, x_k) = \frac12 \pm \frac12 x_1 x_2 \cdots x_k$. When discussing Max-kXOR we will also always make
the assumption that all scopes are distinct as sets; i.e., we don't have the same constraint or its
negation more than once.
Figure 1: The two forbidden configurations for triangle-free instances

Hypergraph structure. We will be particularly interested in the degree $\deg(i)$ of each variable $x_i$
in an instance. This is simply the number of constraints in which $x_i$ participates; i.e., $\#\{\ell : S_\ell \ni i\}$.
Throughout this work, we let D denote an upper bound on the degree of all variables. 

For our second theorem, we will need to define the notion of "triangle-freeness". 

Definition 2.1. We say that an instance is triangle-free if the scopes of any two distinct constraints 
intersect on at most one variable ("no overlapping constraints") and, moreover, there are no three 
distinct constraints any two of whose scopes intersect ("no hyper-triangles"), see Figure 1. 
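Definition 2.1 can be turned into a brute-force check on a list of scopes. The sketch below is illustrative and rests on one interpretive assumption: a "hyper-triangle" is read as three constraints whose scopes pairwise intersect without a single variable common to all three (otherwise any variable of degree 3 would already be forbidden). The function name `is_triangle_free` is introduced here.

```python
from itertools import combinations

def is_triangle_free(scopes):
    """Check Definition 2.1 on a list of scopes (tuples of variable indices)."""
    sets = [frozenset(s) for s in scopes]
    # "No overlapping constraints": two scopes share at most one variable.
    for a, b in combinations(sets, 2):
        if len(a & b) > 1:
            return False
    # "No hyper-triangles" (our reading): no three scopes that pairwise
    # intersect while having no variable common to all three.
    for a, b, c in combinations(sets, 3):
        if (a & b) and (b & c) and (a & c) and not (a & b & c):
            return False
    return True
```

The check takes cubic time in the number of constraints, which is fine for small examples; an ordinary graph triangle such as `[(0, 1), (1, 2), (0, 2)]` is rejected, while a path or a star of constraints is accepted.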


Fourier representation. We recall that any Boolean function $f : \{\pm1\}^n \to \mathbb{R}$ can be represented
by a multilinear polynomial, or Fourier expansion,

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, x^S, \quad\text{where } x^S = \prod_{i \in S} x_i.$$

For more details see, e.g., [O'D14]; we recall here just a few facts we'll need. First, $\mathbf{E}[f(\boldsymbol{x})] = \hat{f}(\emptyset)$.
(Here and throughout we use boldface for random variables; furthermore, unless otherwise
specified $\boldsymbol{x}$ refers to a uniformly random Boolean string.) Second, Parseval's identity is
$\|f\|_2^2 = \mathbf{E}[f(\boldsymbol{x})^2] = \sum_S \hat{f}(S)^2$, from which it follows that $\mathrm{Var}[f(\boldsymbol{x})] = \sum_{S \ne \emptyset} \hat{f}(S)^2$. Third,

$$\mathrm{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2 = \mathbf{E}[(\partial_i f)(\boldsymbol{x})^2],$$

where $\partial_i f$ is the derivative of f with respect to the $i$th coordinate. This can be defined by the
factorization $f(x) = x_i \cdot (\partial_i f)(x') + g(x')$, where $x' = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$, or equivalently
by $\partial_i f(x') = \frac{f(x', +1) - f(x', -1)}{2}$, where here $(x', b)$ denotes $(x_1, \dots, x_{i-1}, b, x_{i+1}, \dots, x_n)$. We record
here a simple fact about these derivatives:

Lemma 2.2. For any predicate $P : \{\pm1\}^r \to \{0,1\}$, $r \ge 2$, we have $\mathrm{Var}[(\partial_i P)(\boldsymbol{x})] \ge \Omega(2^{-r})$ for all i.

Proof. The function $\partial_i P(x)$ takes values in $\{-\frac12, 0, \frac12\}$. It cannot be constantly 0, since we assume P
depends on its $i$th input. It also cannot be constantly $\frac12$, else we would have $P(x) = \frac12 + \frac12 x_i$ and
so P would not depend on all $r \ge 2$ coordinates. Similarly it cannot be constantly $-\frac12$. Thus $\partial_i P(x)$
is nonconstant, so its variance is $\Omega(2^{-r})$. □
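The Fourier facts recalled above (the mean as the empty coefficient, Parseval, and the two expressions for influence) can be checked by brute force on a small predicate. This is a minimal sketch with exact rational arithmetic; the function names and the example predicate `or3` (with $-1$ read as "true", a convention chosen here) are illustrative assumptions.

```python
from itertools import product, combinations
from fractions import Fraction
from math import prod

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} where f_hat(S) = E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in combinations(range(n), r):
            total = sum(Fraction(f(x)) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs

def influence(coeffs, i):
    """Inf_i[f] as the sum of squared coefficients over sets containing i."""
    return sum(c * c for S, c in coeffs.items() if i in S)

def derivative_second_moment(f, n, i):
    """E[(d_i f)(x')^2] with (d_i f)(x') = (f(x', +1) - f(x', -1)) / 2."""
    total = Fraction(0)
    for rest in product([-1, 1], repeat=n - 1):
        plus = rest[:i] + (1,) + rest[i:]
        minus = rest[:i] + (-1,) + rest[i:]
        d = Fraction(f(plus) - f(minus), 2)
        total += d * d
    return total / 2 ** (n - 1)

def or3(x):
    """3-variable OR as a 0/1 predicate, with -1 read as 'true'."""
    return 1 if -1 in x else 0

coeffs = fourier_coefficients(or3, 3)
```

For this predicate the empty coefficient is 7/8, and both formulas give influence 1/16 for each coordinate, in agreement with the identity $\mathrm{Inf}_i[f] = \mathbf{E}[(\partial_i f)^2]$.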

Given an instance and an assignment $x \in \{\pm1\}^n$, the number of constraints satisfied by the
assignment is simply $\sum_\ell P_\ell(x_{S_\ell})$. This can be thought of as a multilinear polynomial $\{\pm1\}^n \to \mathbb{R}$
of degree⁵ at most k. We would like to make two minor adjustments to it, for simplicity. First, we

5 We have the usual unfortunate terminology clash; here we mean degree as a polynomial. 
will normalize it by a factor of $\frac1m$ so as to obtain the fraction of satisfied constraints. Second, we
will replace $P_\ell$ with $\overline{P}_\ell$, defined by

$$\overline{P}_\ell = P_\ell - \mathbf{E}[P_\ell] = P_\ell - \widehat{P_\ell}(\emptyset).$$

In this way, $\overline{P}_\ell(x_{S_\ell})$ represents the advantage over a random assignment. Thus given an instance,
we define the associated polynomial $\mathfrak{P}(x)$ by

$$\mathfrak{P}(x) = \frac1m \sum_{\ell=1}^m \overline{P}_\ell(x_{S_\ell}).$$

This is a polynomial of degree at most k whose value on an assignment x represents the advantage
obtained over a random assignment in terms of the fraction of constraints satisfied. In general, the
algorithms in this paper are designed to find assignments $x \in \{\pm1\}^n$ with $\mathfrak{P}(x) \ge \Omega(\frac{1}{\sqrt{D}})$.
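To make the definition of the associated polynomial concrete, the sketch below evaluates it directly from a list of (predicate, scope) pairs. The representation of an instance and the toy constraints are assumptions introduced here purely for illustration.

```python
from itertools import product
from fractions import Fraction

def predicate_mean(P, r):
    """E[P] over uniformly random inputs in {-1, 1}^r."""
    return Fraction(sum(P(x) for x in product([-1, 1], repeat=r)), 2 ** r)

def advantage(instance, x):
    """(1/m) * sum_l (P_l(x_{S_l}) - E[P_l]): fraction satisfied minus mu."""
    m = len(instance)
    total = Fraction(0)
    for P, S in instance:
        xs = tuple(x[i] for i in S)
        total += P(xs) - predicate_mean(P, len(S))
    return total / m

# Hypothetical toy instance: two 3-XOR constraints and one 2-OR constraint.
def xor_even(xs):
    return 1 if xs[0] * xs[1] * xs[2] == 1 else 0

def or2(xs):
    return 1 if -1 in xs else 0

instance = [(xor_even, (0, 1, 2)), (xor_even, (1, 2, 3)), (or2, (0, 3))]
```

By construction the advantage averages to zero over all assignments, which is the sense in which it measures improvement over a uniformly random assignment.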


Low-degree polynomials often achieve their expectation. Our proofs will frequently rely on
the following fundamental fact from Fourier analysis, whose proof depends on the well-known
"hypercontractive inequality". A proof of this fact appears in, e.g., [O'D14, Theorem 9.24].

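Fact 2.3. Let $f : \{\pm1\}^n \to \mathbb{R}$ be a multilinear polynomial of degree at most k. Then
$\mathbf{P}[f(\boldsymbol{x}) \ge \mathbf{E}[f]] \ge \frac14 \exp(-2k)$. In particular, by applying this to $f^2$, which has degree at most $2k$, we get

$$\mathbf{P}\bigl[|f(\boldsymbol{x})| \ge \|f\|_2\bigr] \ge \exp(-O(k)),$$

which implies that

$$\mathbf{E}\bigl[|f(\boldsymbol{x})|\bigr] \ge \exp(-O(k)) \cdot \|f\|_2 \ge \exp(-O(k)) \cdot \mathrm{stddev}[f(\boldsymbol{x})].$$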


3 A simple proof for Max-3XOR 


We begin by proving Theorem 1.1 in the case of Max-3XOR, as the proof can be somewhat streamlined
in this case. Given an instance of Max-3XOR we have the corresponding polynomial

$$\mathfrak{P}(x) = \sum_{|S|=3} \widehat{\mathfrak{P}}(S)\, x^S = \sum_{i,j,k \in [n]} a_{ijk}\, x_i x_j x_k,$$

where $\widehat{\mathfrak{P}}(S) \in \{\pm\frac{1}{2m}, 0\}$ depending on whether the corresponding constraint exists in the instance,
and where we have introduced $a_{ijk} = \frac16 \widehat{\mathfrak{P}}(\{i,j,k\})$ for $i,j,k \in [n]$ distinct. We now use the trick of
"decoupling" the first coordinate (cf. [KN08, Lem. 2.1]); i.e., our algorithm will consider
$\widetilde{\mathfrak{P}}(y,z) = \sum_{i,j,k} a_{ijk}\, y_i z_j z_k$, where $y_1, \dots, y_n, z_1, \dots, z_n$ are new variables. The algorithm will ultimately produce
a good assignment $y, z \in \{\pm1\}^n$ for $\widetilde{\mathfrak{P}}$. Then it will define an assignment $x \in \{\pm1\}^n$ by using one
of three "randomized rounding" schemes:


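$$\text{w.p. } \tfrac49,\ \ x_i = \begin{cases} y_i & \text{w.p. } \frac12 \\ z_i & \text{w.p. } \frac12 \end{cases}\ \ \forall i;
\qquad
\text{w.p. } \tfrac49,\ \ x_i = \begin{cases} y_i & \text{w.p. } \frac12 \\ -z_i & \text{w.p. } \frac12 \end{cases}\ \ \forall i;
\qquad
\text{w.p. } \tfrac19,\ \ x_i = -y_i\ \ \forall i.$$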


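We have that $\mathbf{E}[\mathfrak{P}(\boldsymbol{x})]$ is equal to

$$\frac49 \sum_{i,j,k} a_{ijk} \Bigl(\frac{y_i+z_i}{2}\Bigr)\Bigl(\frac{y_j+z_j}{2}\Bigr)\Bigl(\frac{y_k+z_k}{2}\Bigr)
+ \frac49 \sum_{i,j,k} a_{ijk} \Bigl(\frac{y_i-z_i}{2}\Bigr)\Bigl(\frac{y_j-z_j}{2}\Bigr)\Bigl(\frac{y_k-z_k}{2}\Bigr)
+ \frac19 \sum_{i,j,k} a_{ijk} (-y_i)(-y_j)(-y_k)$$

$$= \frac19 \sum_{i,j,k} a_{ijk} \bigl(y_i z_j z_k + z_i y_j z_k + z_i z_j y_k\bigr) = \frac13\, \widetilde{\mathfrak{P}}(y,z). \tag{3.1}$$

Thus in expectation, the algorithm obtains an assignment for $\mathfrak{P}$ achieving at least $\frac13$ of what it
achieves for $\widetilde{\mathfrak{P}}$.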
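Identity (3.1) can be confirmed exactly on a small example by enumerating every outcome of the three rounding schemes. In this sketch the polynomial is stored as a dictionary `c` from triples to coefficients, so that $a_{ijk} = c[S]/6$; the coefficients are taken to be unnormalized signs rather than $\pm\frac{1}{2m}$, which does not affect the identity. All names are illustrative.

```python
from itertools import product
from fractions import Fraction

def poly(c, x):
    """P(x) = sum over triples S = (i, j, k) of c[S] * x_i * x_j * x_k."""
    return sum(v * x[i] * x[j] * x[k] for (i, j, k), v in c.items())

def decoupled(c, y, z):
    """P~(y, z) = sum over ordered (i, j, k) of a_ijk * y_i * z_j * z_k."""
    return sum(Fraction(v, 3) * (y[i] * z[j] * z[k]
                                 + y[j] * z[i] * z[k]
                                 + y[k] * z[i] * z[j])
               for (i, j, k), v in c.items())

def expected_rounded_value(c, y, z):
    """Exact E[P(x)] under the three rounding schemes, by enumeration."""
    n = len(y)
    total = Fraction(0)
    for sign, weight in ((1, Fraction(4, 9)), (-1, Fraction(4, 9))):
        # Each x_i is y_i or sign * z_i, each with probability 1/2.
        for picks in product([0, 1], repeat=n):
            x = [y[i] if p == 0 else sign * z[i] for i, p in enumerate(picks)]
            total += weight * Fraction(poly(c, x), 2 ** n)
    # Third scheme: x = -y with probability 1/9.
    total += Fraction(1, 9) * poly(c, [-yi for yi in y])
    return total
```

For integer coefficients the function returns exactly one third of `decoupled(c, y, z)` for every pair of sign vectors, matching the final equality in (3.1).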

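Let us now write $\widetilde{\mathfrak{P}}(y,z) = \sum_i y_i G_i(z)$, where $G_i(z) = \sum_{j,k} a_{ijk} z_j z_k$. It suffices for the algorithm
to find an assignment for z such that $\sum_i |G_i(z)|$ is large, as it can then achieve this quantity by
taking $y_i = \mathrm{sgn}(G_i(z))$. The algorithm simply chooses $\boldsymbol{z} \in \{\pm1\}^n$ uniformly at random. By
Parseval we have $\mathbf{E}[G_i(\boldsymbol{z})^2] = \sum_{j<k} (2a_{ijk})^2 = \frac19 \mathrm{Inf}_i[\mathfrak{P}]$ for each i. Applying Fact 2.3 (with $k = 2$)
we therefore get $\mathbf{E}[|G_i(\boldsymbol{z})|] \ge \Omega(1) \cdot \sqrt{\mathrm{Inf}_i[\mathfrak{P}]}$. Since $\mathrm{Inf}_i[\mathfrak{P}] = \deg(i)/4m^2$, we conclude

$$\mathbf{E}\Bigl[\sum_i |G_i(\boldsymbol{z})|\Bigr] \ge \Omega(1) \cdot \sum_i \frac{\sqrt{\deg(i)}}{m} \ge \Omega(1) \cdot \sum_i \frac{\deg(i)}{m\sqrt{D}} = \Omega(1) \cdot \frac{1}{\sqrt{D}}.$$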
As $\sum_i |G_i(z)|$ is bounded by 1/2, Markov's inequality implies that the algorithm can with high
probability find a z achieving $\sum_i |G_i(z)| \ge \Omega(\frac{1}{\sqrt{D}})$ after $O(\sqrt{D})$ trials of z. As stated, the algorithm
then chooses y appropriately to attain $\widetilde{\mathfrak{P}}(y,z) \ge \Omega(\frac{1}{\sqrt{D}})$, and finally gets $\frac13$ of this value (in
expectation) for $\mathfrak{P}(x)$.
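The steps of this section (sample z, set y greedily, then round) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the polynomial is again a dictionary `c` from triples to unnormalized coefficients with $a_{ijk} = c[S]/6$, the number of trials is left to the caller, and the names `G_values`, `best_decoupled_assignment` and `round_once` are introduced here.

```python
import random
from fractions import Fraction

def G_values(c, z, n):
    """G_i(z) = sum_{j,k} a_ijk z_j z_k for every i."""
    g = [Fraction(0)] * n
    for (i, j, k), v in c.items():
        w = Fraction(v, 3)  # two orderings of (j, k), each weighted v / 6
        g[i] += w * z[j] * z[k]
        g[j] += w * z[i] * z[k]
        g[k] += w * z[i] * z[j]
    return g

def best_decoupled_assignment(c, n, trials, rng):
    """Try random z, set y_i = sgn(G_i(z)), keep the best sum_i |G_i(z)|."""
    best = None
    for _ in range(trials):
        z = [rng.choice((-1, 1)) for _ in range(n)]
        g = G_values(c, z, n)
        y = [1 if gi >= 0 else -1 for gi in g]
        value = sum(abs(gi) for gi in g)  # equals P~(y, z) for this y
        if best is None or value > best[0]:
            best = (value, y, z)
    return best

def round_once(y, z, rng):
    """Draw x from the three-way randomized rounding scheme."""
    u = rng.random()
    if u < 4 / 9:
        sign = 1
    elif u < 8 / 9:
        sign = -1
    else:
        return [-yi for yi in y]
    return [yi if rng.random() < 0.5 else sign * zi for yi, zi in zip(y, z)]
```

A caller would typically run `round_once` several times and keep the best resulting x, since the guarantee for the rounding step holds only in expectation.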


Derandomization. It is easy to efficiently derandomize the above algorithm. The main step is 
to recognize that "(2,4)-hypercontractivity" is all that's needed for Fact 2.3 (perhaps with a worse 
constant); thus it holds even when the random bits are merely 4-wise independent. This is well 
known, but we could not find an explicit reference; hence we give the proof in the case when 
f is homogeneous of degree 2 (the case that's needed in the above algorithm). Without loss of
generality we may assume $\mathbf{E}[f(\boldsymbol{x})] = 0$ and $\mathbf{E}[f(\boldsymbol{x})^2] = 1$. Then it's a simple exercise to check that
$\mathbf{E}[f(\boldsymbol{x})^4] \le 15$, and this only requires the bits of $\boldsymbol{x}$ to be 4-wise independent. But now

$$\mathbf{P}[f(\boldsymbol{x}) \ge 0] = \mathbf{E}[1_{\{f(\boldsymbol{x}) \ge 0\}}] \ge \mathbf{E}[.13 f(\boldsymbol{x}) + .06 f(\boldsymbol{x})^2 - .002 f(\boldsymbol{x})^4] \ge .06 - .002 \cdot 15 = .03,$$

where we used the elementary fact $1_{\{t \ge 0\}} \ge .13t + .06t^2 - .002t^4$ for all $t \in \mathbb{R}$. Thus indeed the
algorithm can find a z achieving $\sum_i |G_i(z)| \ge \Omega(\frac{1}{\sqrt{D}})$ by enumerating all strings in a 4-wise independent
set; it is well known this can be done in polynomial time. Following this, the algorithm
chooses string y deterministically. Finally, it is clear that each of the three different randomized 
rounding schemes only requires 3-wise independence, and a deterministic algorithm can simply 
try all three and choose the best one. 
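The elementary polynomial inequality used in the derandomization argument is easy to sanity-check numerically. The sketch below evaluates both sides on a finite grid; the grid range and spacing are arbitrary choices made here, so this is a plausibility check and not a proof.

```python
def lower_bound_poly(t):
    """The quartic .13 t + .06 t^2 - .002 t^4 from the text."""
    return 0.13 * t + 0.06 * t ** 2 - 0.002 * t ** 4

def indicator_nonneg(t):
    """Indicator of the event t >= 0."""
    return 1.0 if t >= 0 else 0.0

# Grid on [-20, 20] with spacing 0.001 (an arbitrary illustrative choice).
grid = [i / 1000 for i in range(-20000, 20001)]
worst_gap = min(indicator_nonneg(t) - lower_bound_poly(t) for t in grid)
```

The gap stays nonnegative on the whole grid; it is smallest just to the left of zero, where the indicator is 0 and the polynomial is slightly negative.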


4 A general result for bounded-influence functions 

One can obtain our Theorem 1.1 for higher odd k by generalizing the proof in the preceding section.
Constructing the appropriate "randomized rounding" scheme to decouple the first variable
becomes slightly more tricky, but one can obtain the identity analogous to (3.1) through
the use of Chebyshev polynomials. At this point the solution becomes very reminiscent of the
Dinur et al. [DFKO07] work. Hence in this section we will simply directly describe how one can
make [DFKO07] algorithmic.

The main goal of [DFKO07] was to understand the "Fourier tails" of bounded degree-k polynomials.
One of their key technical results was the following theorem, showing that if a degree-k
polynomial has all of its influences small, it must deviate significantly from its mean with noticeable
probability:
Theorem 4.1. ([DFKO07, Theorem 3].) There is a universal constant C such that the following holds.
Suppose $g : \{\pm1\}^n \to \mathbb{R}$ is a polynomial of degree at most k and assume $\mathrm{Var}[g] = 1$. Let $t \ge 1$ and
suppose that $\mathrm{Inf}_i[g] \le C^{-k} t^{-2}$ for all $i \in [n]$. Then

$$\mathbf{P}[|g(\boldsymbol{x})| \ge t] \ge \exp(-C t^2 k^2 \log k).$$


In the context of Max-kXOR, this theorem already nearly proves our Theorem 1.1. The reason
is that in this context, the associated polynomial $\mathfrak{P}(x)$ is given by

$$\mathfrak{P}(x) = \frac{1}{2m} \sum_{\ell=1}^m b_\ell \prod_{j \in S_\ell} x_j, \quad\text{where } b_\ell \in \{-1, 1\}.$$

Hence $\mathrm{Var}[\mathfrak{P}] = 1/4m$ and $\mathrm{Inf}_i[\mathfrak{P}] = \deg(x_i)/4m^2 \le D/4m^2$. Taking $g = 2\sqrt{m} \cdot \mathfrak{P}$ and
$t = \exp(-O(k)) \cdot \sqrt{m/D}$, Theorem 4.1 immediately implies that

$$\mathbf{P}\Bigl[|\mathfrak{P}(\boldsymbol{x})| \ge \exp(-O(k)) \cdot \frac{1}{\sqrt{D}}\Bigr] \ge \exp(-O(m/D)). \tag{4.1}$$

This already shows the desired existential result, that there exists an assignment beating the random
assignment by $\exp(-O(k)) \cdot \frac{1}{\sqrt{D}}$. The only difficulty is that the low probability bound in (4.1)
does not imply we can find such an assignment efficiently.

However this difficulty really only arises because [DFKO07] had different goals. In their work,
it was essential to show that g achieves a slightly large value on a completely random input.⁶
By contrast, we are at liberty to show g achieves a large value however we like (semi-randomly,
greedily) so long as our method is algorithmic. That is precisely what we do in this section of the
paper. Indeed, in order to "constructivize" [DFKO07], the only fundamental adjustment we need
to make is at the beginning of the proof of their Lemma 1.3: when they argue that
"$\mathbf{P}[|\ell(\boldsymbol{x})| \ge t'] \ge \exp(-O(t'^2))$ for the degree-1 polynomial $\ell(x)$", we can simply greedily choose an assignment x
with $|\ell(x)| \ge t'$.

Our constructive version of Theorem 4.1 follows. It directly implies our Theorem 1.1, as described
above.


Theorem 4.2. There is a universal constant C and a randomized algorithm such that the following holds.
Let $g : \{\pm1\}^n \to \mathbb{R}$ be a polynomial with degree at most k and $\mathrm{Var}[g] = 1$ be given. Let $t \ge 1$ and suppose
that $\mathrm{Inf}_i[g] \le C^{-k} t^{-2}$ for all $i \in [n]$. Then with high probability the algorithm outputs an assignment x
with $|g(x)| \ge t$. The running time of the algorithm is $\mathrm{poly}(m, n, \exp(k))$, where m is the number of
nonzero monomials in g.⁷

The algorithm AdvRand achieving Theorem 4.2 is given below. It is derived directly
from [DFKO07], and succeeds with probability that is inverse polynomial in n. The success probability
is then boosted by running the algorithm multiple times. We remark that $\eta_0^{(k)}, \eta_1^{(k)}, \dots, \eta_k^{(k)}$
denote the $k+1$ extrema in $[-1,1]$ of the $k$th Chebyshev polynomial of the first kind $T_k(x)$, and are
given by $\eta_j^{(k)} = \cos(j\pi/k)$ for $0 \le j \le k$. We now describe the algorithm below, for completeness.

In the rest of the section, we will assume without loss of generality that k is odd (for even k, we 
just think of the polynomial as being of degree k + 1, with the degree (k + 1) part being 0). 

6 Also, their efforts were exclusively focused on the parameter k, with quantitative dependencies on t not mattering.
Our focus is essentially the opposite. 

7 For simplicity in our algorithm, we assume that exact real arithmetic can be performed efficiently. 
AdvRand: Algorithm for Advantage over Average for degree-k polynomials

Input: a degree-k function g
Output: an assignment x

1. Let $1 \le s \le \log_2 k$ be a scale such that the mass (i.e., sum of squares of coefficients) of the
Fourier transform of g on levels between $2^{s-1}$ and $2^s$ is at least $1/\log k$.

2. For every $i \in [n]$, put i in set U with probability $2^{-s}$. For every $i \notin U$, set $y_i \in \{-1, 1\}$
uniformly at random and let y be the assignment restricted to the variables in $[n] \setminus U$.

3. Let $g_y$ be the restriction obtained. For every $j \in U$, set $x_j = \mathrm{sign}(\widehat{g_y}(\{j\}))$.

4. Pick $r \in \{0, 1, \dots, k\}$ uniformly at random, and let $\eta = \eta_r^{(k)}/2$.

5. For each coordinate $j \in U$, flip $x_j$ independently at random with probability $(1 - \eta)/2$.

6. Output x.
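The six steps above can be sketched in code as follows. This is a hedged illustration, not a faithful implementation: the polynomial is assumed to be given as a dictionary from `frozenset` index sets to Fourier coefficients, Step 1 simply takes the scale with the largest mass (up to $\lceil \log_2 k \rceil$) instead of testing the $1/\log k$ threshold, `sign(0)` is taken to be $+1$, and floating point replaces exact real arithmetic. The names `adv_rand` and `evaluate` are introduced here.

```python
import math
import random

def adv_rand(g, n, k, rng):
    """One run of AdvRand; g maps frozenset(S) -> Fourier coefficient."""
    # Step 1: choose a scale s (here: the scale carrying the most mass).
    max_s = max(1, math.ceil(math.log2(k)))

    def mass(s):
        return sum(c * c for S, c in g.items()
                   if 2 ** (s - 1) <= len(S) <= 2 ** s)

    s = max(range(1, max_s + 1), key=mass)

    # Step 2: random set U (each i with probability 2^-s); the entries
    # of x outside U play the role of the random restriction y.
    U = {i for i in range(n) if rng.random() < 2.0 ** (-s)}
    x = [rng.choice((-1, 1)) for _ in range(n)]

    # Step 3: first-order coefficients of the restriction g_y, then signs.
    lin = {j: 0.0 for j in U}
    for S, c in g.items():
        inside = S & U
        if len(inside) == 1:
            (j,) = inside
            lin[j] += c * math.prod(x[i] for i in S if i != j)
    for j in U:
        x[j] = 1 if lin[j] >= 0 else -1

    # Step 4: eta is half of a random extremum cos(r * pi / k) of T_k.
    r = rng.randrange(k + 1)
    eta = math.cos(r * math.pi / k) / 2

    # Step 5: flip each coordinate in U with probability (1 - eta) / 2.
    for j in U:
        if rng.random() < (1 - eta) / 2:
            x[j] = -x[j]

    # Step 6: output x.
    return x

def evaluate(g, x):
    """Evaluate the polynomial g at the assignment x."""
    return sum(c * math.prod(x[i] for i in S) for S, c in g.items())
```

Since a single run succeeds only with small probability, a caller would repeat `adv_rand` many times and keep the assignment with the largest value of `abs(evaluate(g, x))`.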


We now give the analysis of the algorithm, following [DFKO07]. The second step of the algorithm
performs a random restriction, that ensures that $g_y$ has a lot of mass on the first-order
Fourier coefficients. The key lemma (that follows from the proof of Lemma 1.3 and Lemma 4.1 in 
[DFKO07]) shows that we can find an assignment that obtains a large value for a polynomial with 
sufficient "smeared" mass on the first-order Fourier coefficients. 

Lemma 4.3. Suppose $g : \{\pm1\}^N \to \mathbb{R}$ has degree at most k, $t \ge 1$, and $\sum_{i \in [N]} |\hat{g}(\{i\})| \ge 2t(k+1)$.
Then a randomized polynomial time algorithm outputs a distribution over assignments $x \in \{-1,1\}^N$ such
that

$$\mathbf{P}[|g(\boldsymbol{x})| \ge t] \ge \exp(-O(k)).$$


The algorithm proving Lemma 4.3 corresponds to Steps (3-6) of the Algorithm AdvRand. 

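Proof. We sketch the proof, highlighting the differences to Lemma 1.3 of [DFKO07]. First we observe
that by picking the assignment $x^*_i = \mathrm{sign}(\hat{g}(\{i\}))$, we can maximize the linear portion as

$$\sum_{i \in [N]} \hat{g}(\{i\})\, x^*_i = \sum_{i \in [N]} |\hat{g}(\{i\})| \ge 2t(k+1).$$

From this point on, we follow the proof of Lemma 1.3 in [DFKO07] with their initial point $x_0$ being
set to $x^*$. Let $\boldsymbol{z} \leftarrow \{\pm1\}^N$ be a random string generated by independently setting each coordinate
$\boldsymbol{z}_j = -1$ with probability $(1 - \eta)/2$ (as in step 5 of the algorithm), and let

$$(T_\eta g)(x^*) := \mathbf{E}_{\boldsymbol{z}}[g(x^* \cdot \boldsymbol{z})].$$

Lemma 1.3 of [DFKO07], by considering $(T_\eta g)(x^*)$ as a polynomial in $\eta$ and using the extremal
properties of Chebyshev polynomials (Corollary 2.8 in [DFKO07]), shows that there exists
$\eta \in \{\frac12 \eta_0^{(k)}, \frac12 \eta_1^{(k)}, \dots, \frac12 \eta_k^{(k)}\}$ such that

$$\bigl|\mathbf{E}_{\boldsymbol{z}}[g(x^* \cdot \boldsymbol{z})]\bigr| \ge t. \tag{4.2}$$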
Consider $g(x^* \cdot z)$ as a polynomial in z, with degree at most k. As in [DFKO07], we will now
use the hypercontractivity to give a lower bound on the probability (over random z) that $|g(x^* \cdot z)|$
exceeds the expectation. Note that our choice of $\eta \in [-\frac12, \frac12]$ and hence the bias is in the interval
$[\frac14, \frac34]$. Using Lemma 2.5 in [DFKO07] (the analogue of Fact 2.3 for biased measures), it follows that

$$\mathbf{P}_{\boldsymbol{z}}\bigl[|g(x^* \cdot \boldsymbol{z})| \ge t\bigr] \ge \tfrac14 \exp(-2k).$$

Hence when x is picked according to $\mathcal{D}$, with probability at least $1/(k+1)$ the algorithm chooses
an $\eta$ such that (4.2) holds, and then a random z succeeds with probability $\exp(-O(k))$, thereby
giving the required success probability. □


We now sketch the proof of the constructive version of Theorem 3 in [DFKO07], highlighting 
why algorithm AdvRand works. 

Proof of Theorem 4.2. The scale s is chosen such that the Fourier coefficients of g of order $[2^{s-1}, 2^s]$
have mass at least $1/\log k$. The algorithm picks set U randomly by choosing each variable with
probability $2^{-s}$, and $g_y$ is the restriction of g to the coordinates in U obtained by setting the other
variables randomly to $y \in \{-1, 1\}^{[n] \setminus U}$.

Let $\gamma_i = \sum_{S \cap U = \{i\}} \hat{g}(S)^2$. Fixing U and y, let the indices $T = \{i \in U : \widehat{g_y}(\{i\})^2 \le (2e)^{2k} \gamma_i\}$.
The proof of Theorem 3 in [DFKO07] shows that a constant fraction of the first order Fourier
coefficients are large; in particular after Steps 1 and 2 of the algorithm,

$$\mathbf{P}_{U,y}\Bigl[\sum_{i \in T} \widehat{g_y}(\{i\})^2 \ge \frac{1}{100 \log k}\Bigr] \ge \exp(-O(k)). \tag{4.3}$$


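Further, for $i \in T$, we have $|\widehat{g_y}(\{i\})| \le (2e)^k \sqrt{\gamma_i} \le (2e)^k \sqrt{\mathrm{Inf}_i(g)}$. Hence, when the above event
in (4.3) is satisfied we have

$$\sum_{i \in U} |\widehat{g_y}(\{i\})| \ge \frac{1}{\max_{i \in T} |\widehat{g_y}(\{i\})|} \sum_{i \in T} \widehat{g_y}(\{i\})^2
\ge \frac{1}{(2e)^k \sqrt{\max_i \mathrm{Inf}_i(g)}} \cdot \frac{1}{100 \log k} \ge 2t(k+1).$$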


Hence, applying Lemma 4.3 with $g_y$ we get that

$$\mathbf{P}_{\boldsymbol{x} \in \mathcal{D}}\bigl[|g(\boldsymbol{x})| \ge t\bigr] \ge \exp(-O(k)), \tag{4.4}$$

where $\mathcal{D}$ is the distribution over assignments x output by the algorithm. Repeating this algorithm
$\exp(O(k))$ times, we get the required high probability of success. □


5 Triangle-free instances 

In this section we present the proof of Theorem 1.2, which gives an efficient algorithm for beating 
the random assignment in the case of arbitrary triangle-free CSPs (recall Definition 2.1). We now 
restate Theorem 1.2 and give its proof. As in the proof of Theorem 4.2, we can easily move from 
an expectation guarantee to a high probability guarantee by first applying Markov's inequality, 
and then repeating the algorithm exp(k) poly (n,m) times; hence we will prove the expectation 
guarantee here. 
Theorem 5.1. There is a $\mathrm{poly}(m, n, \exp(k))$-time randomized algorithm with the following guarantee.
Let the input be a triangle-free instance over n Boolean variables, with m arbitrary constraints each of
arity between 2 and k. Assume that each variable participates in at most D constraints. Let the associated
polynomial be $\mathfrak{P}(x)$. Then the algorithm outputs an assignment $\boldsymbol{x} \in \{\pm1\}^n$ with

$$\mathbf{E}[\mathfrak{P}(\boldsymbol{x})] \ge \exp(-O(k)) \cdot \sum_{i=1}^n \frac{\sqrt{\deg(i)}}{m} \ge \exp(-O(k)) \cdot \frac{1}{\sqrt{D}}.$$

Proof. Let $(F, G)$ be a partition of $[n]$, with $F$ standing for "Fixed" and $G$ standing for "Greedy".
Eventually the algorithm will choose the partition randomly, but for now we treat it as fixed. We
will write the two parts of the algorithm's random assignment $x$ as $(x_F, x_G)$. The bits $x_F$ will first
be chosen independently and uniformly at random. Then the bits $x_G$ will be chosen in a careful
way which will make them uniformly random, but not completely independent.

To make this more precise, define a constraint $(P_\ell, S_\ell)$ to be active if its scope $S_\ell$ contains exactly
one coordinate from $G$. Let us partition these active constraints into groups

$$N_j = \{\ell : (P_\ell, S_\ell) \text{ is active and } S_\ell \ni j\}, \quad j \in G.$$

For each coordinate $j \in G$, we'll define $A_j \subseteq F$ to be the union of all active scopes involving $j$ (but
excluding $j$ itself); i.e.,

$$A_j = \bigcup \{S_\ell \setminus \{j\} : \ell \in N_j\}.$$

This set $A_j$ may be empty. Our algorithm's choice of $x_G$ will have the following property:

$\forall j \in G$, the distribution of $x_j$ is uniformly random, and it depends only on $(x_i : i \in A_j)$. (†)
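
As an illustration, the groups $N_j$ and the sets $A_j$ can be computed directly from the constraint scopes once the partition is fixed. The sketch below assumes scopes are given as Python sets of variable indices; the function name `active_groups` is hypothetical and not part of the paper.

```python
def active_groups(scopes, G):
    """Compute N_j and A_j for every greedy coordinate j.

    scopes: list of sets of variable indices, one per constraint.
    G: set of "Greedy" coordinates; all other coordinates are "Fixed".
    A constraint is active when exactly one of its variables lies in G.
    """
    N = {j: [] for j in G}      # indices of active constraints containing j
    A = {j: set() for j in G}   # union of those scopes, excluding j itself
    for ell, S in enumerate(scopes):
        in_G = [j for j in S if j in G]
        if len(in_G) == 1:
            j = in_G[0]
            N[j].append(ell)
            A[j] |= set(S) - {j}
    return N, A
```

For example, with scopes `[{0, 1}, {1, 2, 3}, {4, 6}, {2, 5}]` and `G = {1, 4}`, the first three constraints are active and the last one, which lies entirely in the fixed part, is inactive.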

From property (†) we may derive:

Claim 5.1.1. For every inactive constraint $(P_\ell, S_\ell)$, the random assignment bits $x_{S_\ell}$ are uniform and
independent.

Proof of Claim. First consider the coordinates $j \in S_\ell \cap G$. By the property (†), each such $x_j$ depends
only on $(x_i : i \in A_j)$; further, these sets $A_j$ are disjoint precisely because of the "no hyper-triangles"
part of triangle-freeness. Thus indeed the bits $(x_j : j \in S_\ell \cap G)$ are uniform and mutually
independent. The remaining coordinates $S_\ell \cap F$ are also disjoint from all these $(A_j)_{j \in S_\ell \cap G}$, by the "no
overlapping constraints" part of the triangle-free property. Thus the remaining bits $(x_i : i \in S_\ell \cap F)$
are uniform, independent, and independent of the bits $(x_j : j \in S_\ell \cap G)$, completing the proof of
the claim. □


An immediate corollary of the claim is that all inactive constraints $P_\ell$ contribute nothing, in
expectation, to $\mathbf{E}[\mathfrak{P}(x)]$. Thus it suffices to consider the contribution of the active constraints. Our
main goal will be to show that the bits $x_G$ can be chosen in such a way that


$$\forall j \in G: \quad \mathbf{E}\Bigl[\sum_{\ell \in N_j} P_\ell(x_{S_\ell})\Bigr] \geq \exp(-O(k)) \cdot \sqrt{|N_j|} \tag{5.1}$$


and hence

$$\mathbf{E}[\mathfrak{P}(x)] \geq \frac{1}{m} \cdot \exp(-O(k)) \cdot \sum_{j \in G} \sqrt{|N_j|}. \tag{5.2}$$

Given (5.2) it will be easy to complete the proof of the theorem by choosing the partition (F, G) 
randomly. 
So towards showing (5.1), fix any $j \in G$. For each $\ell \in N_j$ we can write $P_\ell(x_{S_\ell}) = x_j Q_\ell(x_{S_\ell \setminus \{j\}}) +
R_\ell(x_{S_\ell \setminus \{j\}})$, where $Q_\ell = \partial_j P_\ell$. Since the bits $x_i$ for $i \in S_\ell \setminus \{j\} \subseteq F$ are chosen uniformly
and independently, the expected contribution to (5.1) from the $R_\ell$ polynomials is 0. Thus we just
need to establish


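$$\mathbf{E}\Bigl[x_j \cdot \sum_{\ell \in N_j} Q_\ell\Bigr] \geq \exp(-O(k)) \cdot \sqrt{|N_j|}, \quad \text{where } Q_\ell = Q_\ell(x_{S_\ell \setminus \{j\}}). \tag{5.3}$$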
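
The decomposition of a constraint into a part multiplied by $x_j$ and a part not involving $x_j$ can be computed pointwise. The sketch below assumes a predicate is available as a Python function on tuples in $\{-1,+1\}^r$; `split_on_variable` is a hypothetical helper name.

```python
def split_on_variable(P, j):
    """Return (Q, R) such that P(x) = x[j] * Q(x) + R(x) for all x.

    P takes a tuple of +1/-1 values. Neither Q nor R depends on x[j]:
    Q is the derivative of P in direction j, R is the average over x[j].
    """
    def with_value(x, v):
        # Copy of x with coordinate j overwritten by v.
        return x[:j] + (v,) + x[j + 1:]

    def Q(x):
        return (P(with_value(x, 1)) - P(with_value(x, -1))) / 2

    def R(x):
        return (P(with_value(x, 1)) + P(with_value(x, -1))) / 2

    return Q, R
```

Since $R$ is the average of $P$ over $x_j$, it has the same mean as $P$ under uniform inputs, which is why the $R_\ell$ terms vanish in expectation once each constraint polynomial has mean zero.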


We now finally describe how the algorithm chooses the random bit $x_j$. Naturally, we will choose
it to be $+1$ when $\sum_{\ell \in N_j} Q_\ell$ is "large" and $-1$ otherwise. Doing this satisfies the second aspect of
property (†), that $x_j$ should depend only on $(x_i : i \in A_j)$. To satisfy the first aspect of property (†),
that $x_j$ is equally likely $\pm 1$, we are essentially forced to define

$$x_j = \mathrm{sgn}\Bigl(\sum_{\ell \in N_j} Q_\ell - \theta_j\Bigr), \tag{5.4}$$

where $\theta_j$ is defined to be a median of the random variable $\sum_{\ell \in N_j} Q_\ell$.

(Actually, we have to be a little careful about this definition. For one thing, if the median
$\theta_j$ is sometimes achieved by the random variable, we would have to carefully define $\mathrm{sgn}(0)$ to
be sometimes $+1$ and sometimes $-1$ so that $x_j$ is equally likely $\pm 1$. For another thing, we are
assuming here that the algorithm can efficiently compute the medians $\theta_j$. We will describe how to
handle these issues in a technical remark after the proof.)
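
For intuition, when the number of inputs is small enough to enumerate, a median and a suitable randomized definition of $\mathrm{sgn}(0)$ can be found by brute force. This is only an illustrative sketch (its running time is exponential in the number of inputs, so it is not the efficient procedure); the name `median_rule` and the convention that `Q` takes a tuple of $\pm 1$ values are assumptions.

```python
from fractions import Fraction
from itertools import product


def median_rule(Q, num_inputs):
    """Return (theta, p_plus) making sgn(Q - theta) exactly unbiased.

    theta is a median of Q under uniform +1/-1 inputs, and p_plus is the
    probability with which sgn(0) should be set to +1, chosen so that
    Pr[Q > theta] + p_plus * Pr[Q == theta] = 1/2.
    """
    values = sorted(Q(x) for x in product((-1, 1), repeat=num_inputs))
    total = len(values)
    theta = values[(total - 1) // 2]  # lower median, always attained
    above = Fraction(sum(v > theta for v in values), total)
    equal = Fraction(sum(v == theta for v in values), total)
    p_plus = (Fraction(1, 2) - above) / equal
    return theta, p_plus
```

Because `theta` is a median, both `Pr[Q > theta]` and `Pr[Q < theta]` are at most 1/2, so `p_plus` always lies in $[0, 1]$.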

Having described the definition (5.4) of $x_j$ satisfying property (†), it remains to verify the
inequality (5.3). Notice that by the "no overlapping constraints" aspect of triangle-freeness, the
random variables $Q_\ell$ are actually mutually independent. Further, Lemma 2.2 implies that each has
variance $\Omega(2^{-k})$; hence the variance of $Q := \sum_{\ell \in N_j} Q_\ell$ is $\exp(-O(k)) \cdot |N_j|$. Thus inequality (5.3)
is equivalent to


$$\mathbf{E}[\mathrm{sgn}(Q - \theta_j) Q] \geq \exp(-O(k)) \cdot \mathrm{stddev}[Q] = \exp(-O(k)) \cdot \mathrm{stddev}[Q - \theta_j].$$


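Now

$$\mathbf{E}[\mathrm{sgn}(Q - \theta_j) Q] = \mathbf{E}[\mathrm{sgn}(Q - \theta_j)(Q - \theta_j + \theta_j)] = \mathbf{E}[|Q - \theta_j|] + \mathbf{E}[x_j \cdot \theta_j]. \tag{5.5}$$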

We have $\mathbf{E}[x_j \cdot \theta_j] = 0$ since $\mathbf{E}[x_j] = 0$. And as for $\mathbf{E}[|Q - \theta_j|]$, it is indeed at least $\exp(-O(k)) \cdot
\mathrm{stddev}[Q]$ by Fact 2.3, since $Q$ is a degree-$(k-1)$ function of uniform and independent random
bits. Thus we have finally established (5.1), and therefore (5.2).

To conclude, we analyze what happens when the algorithm initially chooses a uniformly random
partition $(F, G)$ of $[n]$. In light of (5.2), it suffices to show that for each $i \in [n]$ we have

$$\mathbf{E}\Bigl[\mathbf{1}[i \in G] \cdot \sqrt{|N_i|}\Bigr] \geq \exp(-O(k)) \cdot \sqrt{\deg(i)}. \tag{5.6}$$


We have $\mathbf{P}[i \in G] = 1/2$; conditioning on this event, let us consider the random variable $|N_i|$; i.e.,
the number of active constraints involving variable $x_i$. A constraint scope $S_\ell$ containing $i$ becomes
active if and only if all the other indices in $S_\ell$ go into $F$, an event that occurs with probability $2^{-k+1}$
(at least). Furthermore, these events are independent across the scopes containing $i$ because of the
"no overlapping constraints" property of triangle-freeness. Thus (conditioned on $i \in G$), the
random variable $|N_i|$ is the sum $A_1 + \cdots + A_{\deg(i)}$ of independent indicator random variables, each
with expectation at least $2^{-k+1}$. Thus we indeed have $\mathbf{E}[\sqrt{|N_i|}] \geq \exp(-O(k)) \sqrt{\deg(i)}$ as needed
to complete the proof of (5.6). This follows from the well known fact that $\mathbf{E}[\sqrt{\mathrm{Binomial}(d,p)}] \geq
\Omega(\min(\sqrt{dp}, dp))$. (Alternatively, this follows from the fact that $A_1 + \cdots + A_{\deg(i)}$ is at least its
expectation $\deg(i) 2^{-k+1}$ with probability at least $\exp(-O(k))$, by Fact 2.3. Here we would use that the
$A_j$'s are degree-$(k-1)$ functions of independent random bits defining $(F, G)$.) The proof is
complete. □
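
The binomial fact used at the end of the proof can be checked numerically for small parameters. The following sketch computes $\mathbf{E}[\sqrt{\mathrm{Binomial}(d,p)}]$ exactly from the probability mass function and compares it with $\min(\sqrt{dp}, dp)$; the function names are hypothetical, and the check is an illustration rather than a proof.

```python
from math import comb, sqrt


def expected_sqrt_binomial(d, p):
    """Value of E[sqrt(X)] for X ~ Binomial(d, p), summed over the pmf."""
    return sum(comb(d, j) * p**j * (1 - p) ** (d - j) * sqrt(j)
               for j in range(d + 1))


def ratio_to_bound(d, p):
    """E[sqrt(X)] divided by min(sqrt(d*p), d*p)."""
    return expected_sqrt_binomial(d, p) / min(sqrt(d * p), d * p)
```

The ratio is never more than 1, since $\sqrt{X} \leq X$ for integer $X$ and $\mathbf{E}[\sqrt{X}] \leq \sqrt{\mathbf{E}[X]}$ by Jensen's inequality. With $d = \deg(i)$ and $p = 2^{-k+1}$, both $\sqrt{dp}$ and $dp$ are at least $p\sqrt{d}$, which gives the $\exp(-O(k)) \sqrt{\deg(i)}$ form needed in (5.6).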

Remark 5.2. Regarding the issue of algorithmically obtaining the medians in the above proof: In
fact, we claim it is unnecessary for the algorithm to compute the median $\theta_j$ of each $Q_j$ precisely.
Instead, our algorithm will (with high probability) compute a number $\theta_j$ and a probabilistic way
of defining $\mathrm{sgn}(0) \in \{\pm 1\}$ such that, when $x_j$ is defined to be $\mathrm{sgn}(Q_j - \theta_j)$, we have $|\mathbf{E}[x_j]| \leq \delta$,
where $\delta = 1/\mathrm{poly}(m, n, \exp(k))$ is sufficiently small. First, let us briefly say why this is sufficient.
The above proof relied on $\mathbf{E}[x_j] = 0$ in two places. One place was in the last term of (5.5), where
we used $\mathbf{E}[x_j \cdot \theta_j] = 0$. Now in the approximate case, we'll have $|\mathbf{E}[x_j \cdot \theta_j]| \leq \delta m$, and by taking
$\delta$ appropriately small this will contribute negligibly to the overall theorem. The other place that
$\mathbf{E}[x_j] = 0$ was used was in deducing from Claim 5.1.1 that the inactive constraints contributed
nothing to the algorithm's expected value. When we merely have $|\mathbf{E}[x_j]| \leq \delta$ (but still have the
independence used in the claim), it's easy to see from Fourier considerations that each inactive
constraint still contributes at most $2^k \delta$ to the overall expectation, and again this is negligible for
the theorem as a whole if $\delta = 1/\mathrm{poly}(m, n, \exp(k))$ is sufficiently small. Finally, it is not hard
to show that the algorithm can compute an appropriate $\theta_j$ and probabilistic definition of $\mathrm{sgn}(0)$
in $\mathrm{poly}(m, n, \exp(k))$ time (with high probability), just by sampling to find a good approximate
median $\theta_j$ and then also estimating $\mathbf{P}[Q_j = \theta_j]$ to handle the definition of $\mathrm{sgn}(0)$.
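
One possible way to realize the sampling idea in this remark is sketched below. It is an assumption-laden illustration, not necessarily the authors' exact procedure: `sample_Q` is a hypothetical callback returning one draw of the sum of the relevant derivative polynomials under uniformly random inputs, and the number of samples needed to reach a given bias bound is not analyzed here.

```python
import random


def estimate_median_rule(sample_Q, num_samples, rng):
    """Estimate a median of Q and a tie-breaking probability from samples.

    sample_Q(rng) is assumed to return one draw of Q whose values compare
    exactly (for example integers or Fractions).
    Returns (theta, p_plus), where p_plus is the probability of setting
    sgn(0) = +1 that makes the empirical Pr[sgn(Q - theta) = +1] equal 1/2.
    """
    samples = sorted(sample_Q(rng) for _ in range(num_samples))
    theta = samples[(num_samples - 1) // 2]      # empirical median
    above = sum(v > theta for v in samples)      # count of samples > theta
    equal = sum(v == theta for v in samples)     # count of ties, at least 1
    p_plus = (num_samples / 2 - above) / equal
    return theta, p_plus
```

The returned `p_plus` balances the sign exactly on the sample; how close the true bias is to zero depends on the sample size.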